Reports in Informatics Relation Patterns and Their Automatic Discovery in Biosequences

Authors

  • Inge Jonassen
  • Ingvar Eidhammer
  • Esko Ukkonen
Abstract

We have extended the pattern language used in PROSITE to enable it to describe dependencies between amino acid residues. We have developed a fitness measure, based on the minimum description length principle, that evaluates the significance of such patterns in relation to a set of sequences, and an algorithm that automatically finds significant patterns in unaligned sequences. Computing experiments are reported showing that the method is able to discover residue dependencies in many PROSITE families, as well as some reported earlier by another author as possibly resulting from structural constraints. The method has also demonstrated that many seemingly correlated amino acid positions may result from different subfamilies being present in the family.
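To make the idea of a residue dependency concrete, the following is a minimal sketch, not the authors' pattern language or algorithm, neither of which is specified in this abstract. It assumes a basic subset of standard PROSITE syntax (`x`, `[...]`, `{...}`, repeat counts) and models a dependency as a set of allowed residue pairs at two offsets within a match. The function names, the offsets, and the allowed-pair set are all hypothetical choices made for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern (e.g. 'C-x(2,4)-{P}') to a Python regex.

    Handles only x, [..], {..}, single residues and (n) / (n,m) repeats;
    anchors '<' and '>' are not supported in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # wildcard position
            atom = "."
        elif core.startswith("["):           # set of allowed residues
            atom = core
        elif core.startswith("{"):           # set of forbidden residues
            atom = "[^" + core[1:-1] + "]"
        else:                                # single fixed residue
            atom = re.escape(core)
        if lo and hi:
            atom += "{%s,%s}" % (lo, hi)
        elif lo:
            atom += "{%s}" % lo
        parts.append(atom)
    return "".join(parts)


def matches_with_relation(seq, regex, pos_a, pos_b, allowed_pairs):
    """Return start offsets of non-overlapping matches whose residues at
    offsets pos_a and pos_b (relative to the match start) form an allowed pair.

    The 'relation' here is a hypothetical stand-in for a residue dependency.
    """
    hits = []
    for m in re.finditer(regex, seq):
        s = m.group(0)
        if (s[pos_a], s[pos_b]) in allowed_pairs:
            hits.append(m.start())
    return hits


# Hypothetical dependency: D must pair with K, and E with R.
regex = prosite_to_regex("[DE]-x(2)-[KR]")
hits = matches_with_relation("AADGGKAAEGGKAAEGGR", regex, 0, 3,
                             {("D", "K"), ("E", "R")})
```

In this toy sequence the plain pattern `[DE]-x(2)-[KR]` matches three times, while the added relation rejects the middle occurrence (`EGGK`), which is the kind of distinction a pattern without dependencies cannot express.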

Similar resources

Reports in Informatics Approaches to the Automatic Discovery of Patterns in Biosequences

This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which are the most frequently...

Approaches to the Automatic Discovery of Patterns in Biosequences

This paper surveys approaches to the discovery of patterns in biosequences and places these approaches within a formal framework that systematises the types of patterns and the discovery algorithms. Patterns with expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering the patterns that are the most frequen...

Discovering Patterns and Subfamilies in Biosequences

We consider the problem of automatic discovery of patterns and the corresponding subfamilies in a set of biosequences. The sequences are unaligned and may contain noise of unknown level. The patterns are of the type used in the PROSITE database. In our approach we discover patterns and the respective subfamilies simultaneously. We develop a theoretically substantiated significance measure for a set...

Knowledge Discovery in Biosequences Using Sort Regular Patterns

This paper considers knowledge discovery by sort regular patterns, which are strings over sort letters representing finite sets of basic letters. We devise a learning algorithm for the class based on the minimal multiple generalization technique, and evaluate the method by experiments on biosequences from the GenBank database. The experiments show that a relatively simple sort pattern can represent a...

Automatic Discovery of Technology Networks for Industrial-Scale R&D IT Projects via Data Mining

Industrial-scale R&D IT projects depend on many sub-technologies which, for the project to succeed, need to be understood and have their risks analysed before the project can begin. When planning such an industrial-scale project, the list of technologies and the associations of these technologies with each other are often complex and form a network. Discovery of this network of technologies is time consumi...

Publication date: 1997